SQL-BASED ANALYTIC ALGORITHMS

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit under 35 U.S.C. Section 119(e) of the
co-pending and commonly-assigned U.S. provisional patent application Serial No.
60/102,831, filed October 2, 1998, by Timothy E. Miller, Brian D. Tate, James D.
Hildreth, Miriam H. Herman, Todd M. Brye, and James E. Pricer, entitled
Teradata Scalable Discovery, which application is incorporated by reference herein.

This application is also related to the following co-pending and commonly- 
assigned utility patent applications: 

Application Serial No. --/---,---, filed on same date herewith, by
Brian D. Tate, James E. Pricer, Tej Anand, and Randy G. Kerber, entitled
SQL-Based Analytic Algorithm for Association, attorney's docket number
8219,

Application Serial No. --/---,---, filed on same date herewith, by
James D. Hildreth, entitled SQL-Based Analytic Algorithm for Clustering,
attorney's docket number 8220,

Application Serial No. --/---,---, filed on same date herewith, by
Todd M. Brye, entitled SQL-Based Analytic Algorithm for Rule Induction,
attorney's docket number 8221,

Application Serial No. --/---,---, filed on same date herewith, by
Brian D. Tate, entitled SQL-Based Automated Histogram Bin Data
Derivation Assist, attorney's docket number 8222,

Application Serial No. --/---,---, filed on same date herewith, by
Brian D. Tate, entitled SQL-Based Automated, Adaptive, Histogram Bin
Data Description Assist, attorney's docket number 8223,

Application Serial No. PCT/US99/-----, filed on same date
herewith, by Timothy E. Miller, Brian D. Tate, Miriam H. Herman, Todd
M. Brye, and Anthony L. Rollins, entitled Data Mining Assists in a
Relational Database Management System, attorney's docket number 8224,

Application Serial No. --/---,---, filed on same date herewith, by
Todd M. Brye, Brian D. Tate, and Anthony L. Rollins, entitled SQL-Based
Data Reduction Techniques for Delivering Data to Analytic Tools,
attorney's docket number 8225,

Application Serial No. PCT/US99/-----, filed on same date
herewith, by Timothy E. Miller, Miriam H. Herman, and Anthony L.
Rollins, entitled Techniques for Deploying Analytic Models in Parallel,
attorney's docket number 8226, and

Application Serial No. PCT/US99/-----, filed on same date
herewith, by Timothy E. Miller, Brian D. Tate, and Anthony L. Rollins,
entitled Analytic Logical Data Model, attorney's docket number 8227,

all of which are incorporated by reference herein.

BACKGROUND OF THE INVENTION

1. Field of the Invention
This invention relates in general to a relational database management
system, and in particular, to SQL-based analytic algorithms that use statistical
and machine learning methods to create analytic models from the data residing in a
relational database.

2. Description of Related Art

Relational databases are the predominant form of database management
systems used in computer systems. Relational database management systems are
often used in so-called "data warehouse" applications, where enormous amounts of
data are stored and processed. In recent years, several trends have converged to
create a new class of data warehousing applications known as data mining
applications. Data mining is the process of identifying and interpreting patterns in
databases, and can be generalized into three stages.

Stage one is the reporting stage, which analyzes the data to determine what
happened. Generally, most data warehouse implementations start with a focused
application in a specific functional area of the business. These applications usually
focus on reporting historical snapshots of business information that was previously
difficult or impossible to access. Examples include Sales Revenue Reporting,
Production Reporting and Inventory Reporting, to name a few.

Stage two is the analyzing stage, which analyzes the data to determine why
it happened. As stage one end-users gain previously unseen views of their business,
they quickly seek to understand why certain events occurred; for example, a decline
in sales revenue. After discovering a reported decline in sales, data warehouse users
will then obviously ask, "Why did sales go down?" Learning the answer to this
question typically involves probing the database through an iterative series of ad
hoc or multidimensional queries until the root cause of the condition is discovered.
Examples include Sales Analysis, Inventory Analysis or Production Analysis.

Stage three is the predicting stage, which tries to determine what will
happen. As stage two users become more sophisticated, they begin to extend their
analysis to include prediction of unknown events. For example, "Which end-users
are likely to buy a particular product?", or "Who is at risk of leaving for the
competition?" It is difficult for humans to see or interpret subtle relationships in
data, hence as data warehouse users evolve to sophisticated predictive analysis they
soon reach the limits of traditional query and reporting tools. Data mining helps
end-users break through these limitations by leveraging intelligent software tools to
shift some of the analysis burden from the human to the machine, enabling the
discovery of relationships that were previously unknown.

Many data mining technologies are available, from single algorithm
solutions to complete tool suites. Most of these technologies, however, are used in
a desktop environment where little data is captured and maintained. Therefore,
most data mining tools are used to analyze small data samples, which were gathered
from various sources into proprietary data structures or flat files. On the other
hand, organizations are beginning to amass very large databases and end-users are
asking more complex questions requiring access to these large databases.

Unfortunately, most data mining technologies cannot be used with large
volumes of data. Further, most analytical techniques used in data mining are
algorithmic-based rather than data-driven, and as such, there is currently little
synergy between data mining and data warehouses. Moreover, from a usability
perspective, traditional data mining techniques are too complex for use by database
administrators and application programmers, and are too difficult to change for a
different industry or a different customer.

Thus, there is a need in the art for data mining applications that directly
operate against data warehouses, and that allow non-statisticians to benefit from
advanced mathematical techniques available in a relational environment.

SUMMARY OF THE INVENTION
To overcome the limitations in the prior art described above, and to
overcome other limitations that will become apparent upon reading and
understanding the present specification, the present invention discloses a method,
apparatus, and article of manufacture for performing data mining in a
relational database management system. At least one analytic algorithm is
performed by a computer directly against a relational database, wherein the analytic
algorithm includes SQL statements performed by the relational database
management system and optional programmatic iteration, and the analytic
algorithm creates at least one analytic model within an analytic logical data model
from data residing in the relational database.

An object of the present invention is to provide more efficient usage of
parallel processor computer systems. An object of the present invention is to
provide a foundation for data mining tool sets in relational database management
systems. Further, an object of the present invention is to allow data mining of large
databases.

BRIEF DESCRIPTION OF THE DRAWINGS
Referring now to the drawings in which like reference numbers represent
corresponding parts throughout:

FIG. 1 is a block diagram that illustrates an exemplary computer hardware
environment that could be used with the preferred embodiment of the present
invention;

FIG. 2 is a block diagram that illustrates an exemplary logical architecture
that could be used with the preferred embodiment of the present invention; and

FIGS. 3, 4, and 5 are flowcharts that illustrate exemplary logic performed
according to the preferred embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
In the following description of the preferred embodiment, reference is made
to the accompanying drawings, which form a part hereof, and in which is shown
by way of illustration a specific embodiment in which the invention may be
practiced. It is to be understood that other embodiments may be utilized and
structural changes may be made without departing from the scope of the present
invention.

OVERVIEW
The present invention provides a relational database management system
(RDBMS) that supports data mining operations of relational databases. In essence,
advanced analytic processing capabilities for data mining applications are placed
where they belong, i.e., close to the data. Moreover, the results of these analytic
processing capabilities can be made to persist within the database or can be exported
from the database. These analytic processing capabilities and their results are
exposed externally to the RDBMS by an application programmable interface (API).
According to the preferred embodiment, the data mining process is an
iterative approach referred to as a "Knowledge Discovery Analytic Process"
(KDAP). There are six major tasks within the KDAP:

1. Understanding the business objective.

2. Understanding the source data available.

3. Selecting the data set and "pre-processing" the data.

4. Designing the analytic model.

5. Creating and testing the models.

6. Deploying the analytic models.

The present invention provides various components for addressing these tasks:

• An RDBMS that executes Structured Query Language (SQL)
  statements against a relational database.

• An analytic Application Programming Interface (API) that creates
  scalable data mining functions comprised of complex SQL
  statements.

• Application programs that instantiate and parameterize the analytic
  API.

• Analytic algorithms utilizing:
  ◦ Extended ANSI SQL statements,
  ◦ a Call Level Interface (CLI) comprised of SQL statements
    and programmatic iteration, and
  ◦ a Data Reduction Utility Program comprised of SQL
    statements and programmatic iteration.

• An analytical logical data model (LDM) that stores results from and
  information about the advanced analytic processing in the RDBMS.

• A parallel deployer that controls parallel execution of the results of
  the analytic algorithms that are stored in the analytic logical data
  model.

The benefits of the present invention include:

• Data mining of very large databases directly within a relational
  database.

• Management of analytic results within a relational database.

• A comprehensive set of analytic operations that operate within a
  relational database management system.

• Application integration through an object-oriented API.

These components and benefits are described in more detail below.

HARDWARE ENVIRONMENT
FIG. 1 is a block diagram that illustrates an exemplary computer hardware
environment that could be used with the preferred embodiment of the present
invention. In the exemplary computer hardware environment, a massively parallel
processing (MPP) computer system 100 is comprised of one or more processors or
nodes 102 interconnected by a network 104. Each of the nodes 102 is comprised of
one or more processors, random access memory (RAM), read-only memory
(ROM), and other components. It is envisioned that attached to the nodes 102 may
be one or more fixed and/or removable data storage units (DSUs) 106 and one or
more data communications units (DCUs) 108, as is well known in the art.

Each of the nodes 102 executes one or more computer programs, such as a
Data Mining Application (APPL) 110 performing data mining operations,
Advanced Analytic Processing Components (AAPC) 112 for providing advanced
analytic processing capabilities for the data mining operations, and/or a Relational
Database Management System (RDBMS) 114 for managing a relational database 116
stored on one or more of the DSUs 106 for use in the data mining applications,
wherein various operations are performed in the APPL 110, AAPC 112, and/or
RDBMS 114 in response to commands from one or more Clients 118. In
alternative embodiments, the APPL 110 may be executed in one or more of the
Clients 118, or on an application server on a different platform attached to the
network 104.

Generally, the computer programs are tangibly embodied in and/or
retrieved from RAM, ROM, one or more of the DSUs 106, and/or a remote device
coupled to the computer system 100 via one or more of the DCUs 108. The
computer programs comprise instructions which, when read and executed by a
node 102, cause the node 102 to perform the steps necessary to execute the steps or
elements of the present invention.

Those skilled in the art will recognize that the exemplary environment
illustrated in FIG. 1 is not intended to limit the present invention. Indeed, those
skilled in the art will recognize that other alternative hardware environments may
be used without departing from the scope of the present invention. In addition, it
should be understood that the present invention may also apply to other computer
programs than those disclosed herein.



LOGICAL ARCHITECTURE
FIG. 2 is a block diagram that illustrates an exemplary logical architecture of
the AAPC 112, and its interaction with the APPL 110, RDBMS 114, relational
database 116, and Client 118, according to the preferred embodiment of the present
invention. In the preferred embodiment, the AAPC 112 includes the following
components:

• An Analytic Logical Data Model (LDM) 200 that stores results from
  the advanced analytic processing in the RDBMS 114,

• One or more Scalable Data Mining Functions 202 that
  comprise complex, optimized SQL statements that perform
  advanced analytic processing in the RDBMS 114,

• An Analytic Application Programming Interface (API) 204 that
  provides a mechanism for an APPL 110 or other component to
  invoke the Scalable Data Mining Functions 202,

• One or more Analytic Algorithms 206 that can operate as standalone
  applications or can be invoked by another component, wherein the
  Analytic Algorithms 206 comprise:
  ◦ Extended ANSI SQL 208 that can be used to implement a
    certain class of Analytic Algorithms 206,
  ◦ A Call Level Interface (CLI) 210 that can be used when a
    combination of SQL and programmatic iteration is required
    to implement a certain class of Analytic Algorithms 206, and
  ◦ A Data Reduction Utility Program 212 that can be used to
    implement a certain class of Analytic Algorithms 206 where
    data is first reduced using SQL followed by programmatic
    iteration,

• An Analytic Algorithm Application Programming Interface (API)
  214 that provides a mechanism for an APPL 110 or other
  components to invoke the Analytic Algorithms 206,

• A Parallel Deployer 216 that controls parallel executions of the
  results of an Analytic Algorithm 206 (sometimes referred to as an
  analytic model) that are stored in the Analytic LDM 200, wherein
  the results of executing the Parallel Deployer 216 are stored in the
  RDBMS 114.

Note that the use of these various components is optional, and thus only
some of the components may be used in any particular configuration.

The preferred embodiment is oriented towards a multi-tier logical
architecture, in which a Client 118 interacts with the various components described
above, which, in turn, interface to the RDBMS 114 to utilize a large central
repository of enterprise data stored in the relational database 116 for analytic
processing.

In one example, a Client 118 interacts with an APPL 110, which interfaces
to the Analytic API 204 to invoke one or more of the Scalable Data Mining
Functions 202, which are executed by the RDBMS 114. The results from the
execution of the Scalable Data Mining Functions 202 would be stored as an analytic
model within an Analytic LDM 200 in the RDBMS 114.

In another example, a Client 118 interacts with one or more Analytic
Algorithms 206 either directly or via the Analytic Algorithm API 214. The
Analytic Algorithms 206 comprise SQL statements that may or may not include
programmatic iteration, and the SQL statements are executed by the RDBMS 114.
In addition, the Analytic Algorithms 206 may or may not interface to the Analytic
API 204 to invoke one or more of the Scalable Data Mining Functions 202, which
are executed by the RDBMS 114. Regardless, the results from the execution of the
Analytic Algorithms 206 would be stored as an analytic model within an Analytic
LDM 200 in the RDBMS 114.

In yet another example, a Client 118 interacts with the Parallel Deployer
216, which invokes parallel instances of the results of the Analytic Algorithms 206,
sometimes referred to as an Analytic Model. The Analytic Model is stored in the
Analytic LDM 200 as a result of executing an instance of the Analytic Algorithms
206. The results of executing the Parallel Deployer 216 are stored in the RDBMS
114.

In still another example, a Client 118 interacts with the APPL 110, which
invokes one or more Analytic Algorithms 206 either directly or via the Analytic
Algorithm API 214. The results would be stored as an analytic model within an
Analytic LDM 200 in the RDBMS 114.

The overall goal is to significantly improve the performance, efficiency, and
scalability of data mining operations by performing compute and/or I/O intensive
operations in the various components. The preferred embodiment achieves this not
only through the parallelism provided by the MPP computer system 100, but also
from reducing the amount of data that flows between the APPL 110, AAPC 112,
RDBMS 114, Client 118, and other components.

Those skilled in the art will recognize that the exemplary configurations
illustrated and discussed in conjunction with FIG. 2 are not intended to limit the
present invention. Indeed, those skilled in the art will recognize that other
alternative configurations may be used without departing from the scope of the
present invention. In addition, it should be understood that the present invention
may also apply to other components than those disclosed herein.

Scalable Data Mining Functions
The Scalable Data Mining Functions 202 comprise complex,
optimized SQL statements that are created, in the preferred embodiment, by
parameterizing and instantiating the corresponding Analytic APIs 204. The
Scalable Data Mining Functions 202 perform much of the advanced analytic
processing for data mining applications, when performed by the RDBMS 114,
without having to move data from the relational database 116.

The Scalable Data Mining Functions 202 can be categorized by the
following functions:

• Data Description: The ability to understand and describe the
  available data using statistical techniques. For example, the
  generation of descriptive statistics, frequencies and/or histogram
  bins.

• Data Derivation: The ability to generate new variables
  (transformations) based upon existing detailed data when designing
  an analytic model. For example, the generation of predictive
  variables such as bitmaps, ranges, codes and mathematical functions.

• Data Reduction: The ability to reduce the number of variables
  (columns) or observations (rows) used when designing an analytic
  model. For example, creating Covariance, Correlation, or Sum of
  Squares and Cross-Products (SSCP) Matrices.

• Data Reorganization: The ability to join or denormalize
  pre-processed results into a wide analytic data set.

• Data Sampling/Partitioning: The ability to intelligently request
  different data samples or data partitions. For example, hash data
  partitioning or data sampling.
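As an illustration of the Data Description category, the following is a minimal sketch, not the Scalable Data Mining Functions 202 themselves. It assumes Python's standard-library sqlite3 module standing in for the RDBMS 114, and a hypothetical table `sales` with a numeric column `amount`. It computes descriptive statistics and equal-width histogram bins with SQL, so only the summary rows leave the database.

```python
import sqlite3


def describe_and_bin(conn, table, column, n_bins):
    """Compute descriptive statistics and equal-width histogram bins in SQL.

    `table` and `column` are interpolated into the SQL text, so they must be
    trusted identifiers, not user input. Assumes at least one non-NULL value.
    """
    # Descriptive statistics: one aggregate query; a single row is returned.
    count, low, high, mean = conn.execute(
        f"SELECT COUNT({column}), MIN({column}), MAX({column}), AVG({column}) "
        f"FROM {table}"
    ).fetchone()

    # Equal-width bins; fall back to width 1.0 when every value is identical.
    width = (high - low) / n_bins or 1.0

    # Map each value to a bin index inside the database. The two-argument MIN
    # folds the maximum value into the last bin instead of opening a new one.
    bins = conn.execute(
        f"""
        SELECT MIN(CAST(({column} - ?) / ? AS INTEGER), ?) AS bin,
               COUNT(*) AS freq
        FROM {table}
        WHERE {column} IS NOT NULL
        GROUP BY bin
        ORDER BY bin
        """,
        (low, width, n_bins - 1),
    ).fetchall()
    return {"count": count, "min": low, "max": high, "mean": mean}, bins


def demo_sales_db():
    """Build a tiny in-memory table (hypothetical data) to exercise the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?)", [(1.0,), (2.0,), (3.0,), (4.0,), (10.0,)]
    )
    return conn
```

Calling `describe_and_bin(demo_sales_db(), "sales", "amount", 3)` returns the statistics dictionary together with one `(bin, freq)` row per non-empty bin.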

The principal theme of the Scalable Data Mining Functions 202 is to
facilitate analytic operations within the RDBMS 114, which process data collections
stored in the database 116 and produce results that also are stored in the database
116. Since data mining operations tend to be iterative and exploratory, the database
116 in the preferred embodiment comprises a combined storage and work space
environment. As such, a sequence of data mining operations is viewed as a set of
steps that start with some collection of tables in the database 116, generate a series
of intermediate work tables, and finally produce a result table or view.
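The "tables, then intermediate work tables, then a result view" pattern can be sketched as follows. This is an illustrative assumption rather than the preferred embodiment: sqlite3 stands in for the RDBMS 114, and the `orders` table, the `work_*` tables, the `result_big_spenders` view, and the threshold of 100 are all hypothetical.

```python
import sqlite3


def demo_orders_db():
    """Source collection: a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 60.0), (1, 70.0), (2, 20.0), (3, 100.0)],
    )
    return conn


def run_pipeline(conn):
    """Run a three-step sequence entirely inside the database."""
    # Step 1: intermediate work table with one row per customer.
    conn.execute(
        """
        CREATE TEMP TABLE work_customer_totals AS
        SELECT customer_id, SUM(amount) AS total, COUNT(*) AS n_orders
        FROM orders
        GROUP BY customer_id
        """
    )
    # Step 2: second work table that derives a new variable from the first.
    conn.execute(
        """
        CREATE TEMP TABLE work_customer_avg AS
        SELECT customer_id, total, total * 1.0 / n_orders AS avg_order
        FROM work_customer_totals
        """
    )
    # Step 3: the result is exposed as a view over the last work table.
    conn.execute(
        """
        CREATE TEMP VIEW result_big_spenders AS
        SELECT customer_id, avg_order
        FROM work_customer_avg
        WHERE total >= 100
        """
    )
    return conn.execute(
        "SELECT customer_id, avg_order FROM result_big_spenders ORDER BY customer_id"
    ).fetchall()
```

Because every step reads and writes tables in the same database, an exploratory session can inspect or reuse any intermediate work table without exporting data.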

Analytic Algorithms
The Analytic Algorithms 206 provide statistical and "machine learning"
methods to create Analytic LDMs 200 from the data residing in the relational
database 116. Analytic Algorithms 206 that are completely data driven, such as
association, can be implemented solely in Extended ANSI SQL 208. Analytic
Algorithms 206 that require a combination of SQL and programmatic iteration,
such as induction, can be implemented using the CLI 210. Finally, Analytic
Algorithms 206 that require almost complete programmatic iteration, such as
clustering, can be implemented using a Data Reduction Utility Program 212,
wherein this approach involves data pre-processing that reduces the amount of data
that a non-SQL algorithm can then process.

The Analytic Algorithms 206 significantly improve the performance and
efficiency of data mining operations by providing the technology components to
perform advanced analytic operations directly against the RDBMS 114. In addition,
the Analytic Algorithms 206 leverage the parallelism that exists in the MPP
computer system 100, the RDBMS 114, and the database 116.

The Analytic Algorithms 206 provide data analysts with an unprecedented
option to train and apply "machine learning" analytics against massive amounts of
data in the relational database 116. Prior techniques have failed as their sequential
design is not optimal in an RDBMS 114 environment. Because the Analytic
Algorithms 206 are implemented in Extended ANSI SQL 208, through the CLI
210, and/or by means of the Data Reduction Utility Program 212, they can
therefore leverage the scalability available on the MPP computer system 100. In
addition, taking a data-driven approach to analysis, through the use of complete
Extended ANSI SQL 208, allows people other than highly educated statisticians to
leverage the advanced analytic techniques offered by the Analytic Algorithms 206.

Extended ANSI SQL
As mentioned above, Analytic Algorithms 206 that are completely data
driven, such as affinity analysis, can be implemented solely in Extended ANSI SQL
208. Typically, these types of algorithms operate against a set of tables in the
relational database 116 that are populated with transaction-level data, the source of
which could be point-of-sale devices, automated teller machines, call centers, the
Internet, etc. The SQL statements used to process this data typically build
relationships between and among data elements in the tables. For example, the
SQL statements used to process data from point-of-sale devices may build
relationships between and among products and pairs of products. Additionally, the
dimension of time can be added in such a way that these relationships can be
analyzed to determine how they change over time. As the implementation of the
Analytic Algorithms 206 is solely in SQL statements, the design takes advantage of
the hardware and software environment of the preferred embodiment by
decomposing the SQL statements into a plurality of sort and merge steps that can
be executed concurrently in parallel by the MPP computer system 100.
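To make the "products and pairs of products" relationship concrete, here is a hedged sketch of a purely SQL-driven pair count. It is not the affinity algorithm of the related application; it assumes sqlite3 in place of the RDBMS 114, a hypothetical table `basket_items(basket_id, product)`, and that each product appears at most once per basket.

```python
import sqlite3


def demo_basket_db():
    """Hypothetical point-of-sale data: one row per product per basket."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE basket_items (basket_id INTEGER, product TEXT)")
    conn.executemany(
        "INSERT INTO basket_items VALUES (?, ?)",
        [
            (1, "bread"), (1, "milk"),
            (2, "bread"), (2, "milk"), (2, "eggs"),
            (3, "milk"), (3, "eggs"),
        ],
    )
    return conn


def product_pair_support(conn):
    """Count, for every pair of products, the baskets containing both.

    The self-join pairs items from the same basket; `a.product < b.product`
    keeps each unordered pair once and drops self-pairs. No loop runs in the
    application: the whole analysis is one SQL statement.
    """
    return conn.execute(
        """
        SELECT a.product AS p1, b.product AS p2, COUNT(*) AS n_baskets
        FROM basket_items AS a
        JOIN basket_items AS b
          ON a.basket_id = b.basket_id AND a.product < b.product
        GROUP BY a.product, b.product
        ORDER BY n_baskets DESC, p1, p2
        """
    ).fetchall()
```

A time dimension could be added by grouping on a (hypothetical) basket date column as well, which would show how each pair's count changes over time.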

Call-Level Interface
As mentioned above, Analytic Algorithms 206 that require a mix of
programmatic iteration along with Extended ANSI SQL statements, such as
inductive inference, can be implemented using the CLI 210. Whereas the SQL
approach is appropriate for business problems that are descriptive in nature,
inference problems are predictive in nature and typically require a training phase
where the APPL 110 "learns" various rules based upon the data description,
followed by testing and application, where the rules are validated and applied
against a new data set. This class of algorithms is compute-intensive and
historically cannot handle large volumes of data because they expect the analyzed
data to be in a specific fixed or variable flat file format.

Most implementations first extract the data from the database 116 to
construct a flat file and then execute the "train" portion on this resultant file. This
method is slow and limited by the amount of memory available in the computer
system 100. This process can be improved by leveraging the relational database 116
to perform those portions of the analysis, instead of extracting all the data.

When SQL statements and programmatic iteration are used together, the
RDBMS 114 can be leveraged to perform computations and order data within the
relational database 116, and then extract the information using very little memory
in the APPL 110. Additionally, computations, aggregations and ordering can be
run in parallel, because of the massively parallel nature of the RDBMS 114.
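The division of labor described above, with aggregation in the database and iteration in the application, can be sketched with a deliberately simple rule learner. This is an assumption for illustration only, not the rule induction algorithm of the related application: it picks the single attribute whose majority-class rule makes the fewest training errors. sqlite3's Python interface stands in for the CLI 210, and the `customers` table and its columns are hypothetical.

```python
import sqlite3


def demo_customers_db():
    """Hypothetical training data with two attributes and a class column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (region TEXT, plan TEXT, churned TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            ("east", "basic", "yes"), ("east", "basic", "yes"),
            ("west", "basic", "yes"), ("west", "premium", "no"),
            ("east", "premium", "no"), ("west", "premium", "no"),
        ],
    )
    return conn


def best_single_attribute_rule(conn, table, attributes, target):
    """Return (attribute, training_errors, {value: predicted_class}).

    Identifiers are interpolated into SQL and must be trusted names.
    """
    best = None
    for attr in attributes:  # programmatic iteration in the application
        # The database does the counting; only small summary rows come back.
        rows = conn.execute(
            f"SELECT {attr}, {target}, COUNT(*) FROM {table} "
            f"GROUP BY {attr}, {target}"
        ).fetchall()
        majority = {}  # attribute value -> (most frequent class, its count)
        total = 0
        for value, cls, n in rows:
            total += n
            if value not in majority or n > majority[value][1]:
                majority[value] = (cls, n)
        errors = total - sum(n for _, n in majority.values())
        if best is None or errors < best[1]:
            rule = {value: cls for value, (cls, _) in majority.items()}
            best = (attr, errors, rule)
    return best
```

The application never holds more than one attribute's class counts in memory, which mirrors the point that the information can be extracted "using very little memory in the APPL 110".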

Data Reduction Utility Program
As mentioned above, for Analytic Algorithms 206 that can operate on a
reduced or scaled data set, such as regression or clustering, the Data Reduction
Utility Program 212 can be used. The problem of creating analytic models from
massive amounts of detailed data has often been addressed by sampling, mainly
because compute intensive algorithms cannot handle large volumes of data. The
approach of the Data Reduction Utility Program 212 is to reduce data through
operations such as matrix calculations or histogram binning, and then use this
reduced or scaled data as input to a non-SQL algorithm. This method intentionally
reduces fine numerical data details by assigning them to ranges or bins, correlating
their values, or determining their covariances. The capacity of the preferred
embodiment for creating these data structures from massive amounts of data in
parallel gives it a special opportunity in this area.
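A small worked sketch of this reduce-then-compute pattern: SQL collapses the detailed rows into the handful of sums that make up a two-variable sum-of-squares-and-cross-products summary, and a non-SQL step fits a least-squares line from those sums alone. This is an assumed illustration using sqlite3 and a hypothetical `points(x, y)` table, not the Data Reduction Utility Program 212 itself.

```python
import sqlite3


def demo_points_db():
    """Hypothetical detailed data lying exactly on y = 2x + 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (x REAL, y REAL)")
    conn.executemany(
        "INSERT INTO points VALUES (?, ?)",
        [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],
    )
    return conn


def fit_line(conn, table, x, y):
    """Fit y = slope * x + intercept from SQL-computed sums.

    Identifiers are interpolated into SQL and must be trusted names.
    Requires at least two distinct x values (otherwise denom is zero).
    """
    # Data reduction: any number of rows becomes five numbers.
    n, sx, sy, sxx, sxy = conn.execute(
        f"SELECT COUNT(*), SUM({x}), SUM({y}), SUM({x} * {x}), SUM({x} * {y}) "
        f"FROM {table}"
    ).fetchone()
    # Non-SQL step: closed-form simple linear regression on the reduced data.
    denom = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

The same five sums would be enough to derive the covariance and correlation of `x` and `y`, which is why this kind of reduction is useful as input to several downstream algorithms.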

Analytic Logical Data Model
The Analytic LDM 200, which is integrated with the relational database 116
and the RDBMS 114, provides logical entity and attribute definitions for advanced
analytic processing, i.e., the Scalable Data Mining Functions 202 and Analytic
Algorithms 206, performed by the RDBMS 114 directly against the relational
database 116. These logical entity and attribute definitions comprise metadata that
define the characteristics of data stored in the relational database 116, as well as
metadata that determines how the RDBMS 114 performs the advanced analytic
processing. The Analytic LDM 200 also stores processing results from this
advanced analytic processing, which includes both result tables and derived data for
the Scalable Data Mining Functions 202, Analytic Algorithms 206, and the Parallel
Deployer 216. The Analytic LDM 200 is a dynamic model, since the logical entities
and attributes definitions change depending upon parameterization of the advanced
analytic processing, and since the Analytic LDM 200 is updated with the results of
the advanced analytic processing.

Logic of the Preferred Embodiment
Flowcharts which illustrate the logic of the preferred embodiment of the
present invention are provided in FIGS. 3, 4 and 5. Those skilled in the art will
recognize that this logic is provided for illustrative purposes only and that different
logic may be used to accomplish the same results.

Referring to FIG. 3, this flowchart illustrates the logic of the Scalable Data
Mining Functions 202 according to the preferred embodiment of the present
invention.

Block 300 represents the one or more of the Scalable Data Mining Functions
202 being created via the API 204. This may entail, for example, the instantiation
of an object providing the desired function.

Block 302 represents certain parameters being passed to the API 204, in
order to control the operation of the Scalable Data Mining Functions 202.

Block 304 represents the metadata in the Analytic LDM 200 being accessed,
if necessary for the operation of the Scalable Data Mining Function 202.

Block 306 represents the API 204 generating a Scalable Data Mining
Function 202 in the form of a data mining query based on the passed parameters
and optional metadata.

Block 308 represents the Scalable Data Mining Function 202 being passed to
the RDBMS 114 for execution.
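The sequence of Blocks 300 through 308 can be sketched as a small parameterized query generator. Everything named here is hypothetical: the `FrequencyFunction` class, its parameters, and the use of SQLite's `PRAGMA table_info` as a stand-in for metadata held in the Analytic LDM 200. It is a sketch of the flow under those assumptions, not the Analytic API 204.

```python
import sqlite3


class FrequencyFunction:
    """Hypothetical stand-in for one Scalable Data Mining Function."""

    def __init__(self, table, column, min_count=1):
        # Blocks 300 and 302: the function object is instantiated and its
        # controlling parameters are passed in. Identifiers must be trusted.
        self.table = table
        self.column = column
        self.min_count = min_count

    def generate_sql(self, conn):
        # Block 304: consult metadata (here, the catalog's column list).
        known = {row[1] for row in conn.execute(f"PRAGMA table_info({self.table})")}
        if self.column not in known:
            raise ValueError(f"unknown column {self.column!r} in {self.table!r}")
        # Block 306: generate the data mining query from the parameters.
        return (
            f"SELECT {self.column}, COUNT(*) AS freq FROM {self.table} "
            f"GROUP BY {self.column} HAVING COUNT(*) >= ? "
            f"ORDER BY freq DESC, {self.column}"
        )

    def run(self, conn):
        # Block 308: pass the generated query to the database for execution.
        return conn.execute(self.generate_sql(conn), (self.min_count,)).fetchall()


def demo_regions_db():
    """Hypothetical table used to exercise the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT)")
    conn.executemany(
        "INSERT INTO sales VALUES (?)",
        [("east",), ("east",), ("east",), ("west",), ("north",), ("north",)],
    )
    return conn
```

For example, `FrequencyFunction("sales", "region", min_count=2).run(demo_regions_db())` yields the regions that occur at least twice, most frequent first.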

Referring to FIG. 4, this flowchart illustrates the logic of the Analytic
Algorithms 206 according to the preferred embodiment of the present invention.

Block 400 represents the Analytic Algorithms 206 being invoked, either
directly or via the Analytic Algorithm API 214.

Block 402 represents certain parameters being passed to the Analytic
Algorithms 206, in order to control their operation.

Block 404 represents the metadata in the Analytic LDM 200 being accessed,
if necessary for the operation of the Analytic Algorithms 206.

Block 406 represents the Analytic Algorithms 206 passing SQL statements
to the RDBMS 114 for execution and Block 408 optionally represents the Analytic
Algorithms 206 performing programmatic iteration. Those skilled in the art will
recognize that the sequence of these steps may differ from those described above,
may not include both steps, may include additional steps, and may include
iterations of these steps.

Block 410 represents the Analytic Algorithms 206 storing results in the
Analytic LDM 200.
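The loop formed by Blocks 406, 408 and 410 can be illustrated with a toy example: a two-center, one-dimensional k-means in which each pass's assignment and averaging is a single SQL statement, the convergence loop runs in the application, and the final centers are written to a model table. This is only meant to show the block sequence; it is not the clustering method of the related application (which the text above assigns to the Data Reduction Utility Program 212). The `points` and `model_centroids` tables and the starting centers are assumptions.

```python
import sqlite3


def demo_points_1d_db():
    """Hypothetical one-dimensional data with two obvious groups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (x REAL)")
    conn.executemany(
        "INSERT INTO points VALUES (?)",
        [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)],
    )
    return conn


def kmeans_1d(conn, c0, c1, max_iter=20):
    """Iterate two centers to convergence and persist them as a model."""
    for _ in range(max_iter):  # Block 408: programmatic iteration
        # Block 406: one SQL statement assigns every point to its nearest
        # center and averages each group inside the database.
        rows = conn.execute(
            """
            SELECT CASE WHEN ABS(x - ?) <= ABS(x - ?) THEN 0 ELSE 1 END AS cluster,
                   AVG(x)
            FROM points
            GROUP BY cluster
            """,
            (c0, c1),
        ).fetchall()
        means = dict(rows)
        # An empty cluster keeps its previous center.
        new0, new1 = means.get(0, c0), means.get(1, c1)
        if (new0, new1) == (c0, c1):
            break
        c0, c1 = new0, new1
    # Block 410: store the resulting model in a (hypothetical) model table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_centroids "
        "(cluster INTEGER PRIMARY KEY, center REAL)"
    )
    conn.execute("DELETE FROM model_centroids")
    conn.executemany("INSERT INTO model_centroids VALUES (?, ?)", [(0, c0), (1, c1)])
    return c0, c1
```

Only two numbers per pass cross the boundary between the database and the application, regardless of how many rows `points` holds.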

Referring to FIG. 5, this flowchart illustrates the logic performed by the
RDBMS 114 according to the preferred embodiment of the present invention.

Block 500 represents the RDBMS 114 receiving a query or other SQL
statements.

Block 502 represents the RDBMS 114 analyzing the query.

Block 504 represents the RDBMS 114 generating a plan that enables the
RDBMS 114 to retrieve the correct information from the relational database 116 to
satisfy the query.

Block 506 represents the RDBMS 114 compiling the plan into object code 
for more efficient execution by the RDBMS 114, although it could be interpreted 
rather than compiled. 

Block 508 represents the RDBMS 114 initiating execution of the plan.

Block 510 represents the RDBMS 114 generating results from the execution 
of the plan. 

Block 512 represents the RDBMS 114 either storing the results in the 
Analytic LDM 200, or returning the results to the Analytic Algorithm 206, APPL 
110, and/or Client 118. 
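
The sequence of Blocks 500 through 512 can be observed from the outside with any SQL engine that exposes its query plan. The following is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for the RDBMS 114; the function name process_query and the sales table are hypothetical and are used only for illustration. SQLite's EXPLAIN QUERY PLAN statement reports the plan the engine generated, and a normal execution of the same statement returns the results to the caller.

```python
import sqlite3


def process_query(conn, sql, params=()):
    """Return (plan, results) for one SQL statement.

    Hypothetical helper: the engine analyzes the query and generates a plan
    internally (Blocks 502-506); here the plan is only inspected, then the
    statement is executed (Blocks 508-510) and the rows are returned to the
    caller (one of the options in Block 512).
    """
    # The last column of each EXPLAIN QUERY PLAN row is a text description
    # of one plan step.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
    results = conn.execute(sql, params).fetchall()
    return plan, results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 20.0), ("west", 5.0)],
)

plan, results = process_query(
    conn,
    "SELECT region, SUM(amount) FROM sales WHERE region = ? GROUP BY region",
    ("east",),
)
for step in plan:
    print("plan step:", step)
print("results:", results)
```

The exact wording of the plan steps depends on the SQLite version, so the sketch prints them rather than relying on a particular text.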

CONCLUSION 
This concludes the description of the preferred embodiment of the 
invention. The following describes an alternative embodiment for accomplishing 
the same invention. Specifically, in an alternative embodiment, any type of
computer, such as a mainframe, minicomputer, or personal computer, could be
used to implement the present invention.

In summary, the present invention discloses a method, apparatus, and article
of manufacture for performing data mining in a relational database
management system. At least one analytic algorithm is performed by a computer
directly against a relational database, wherein the analytic algorithm includes SQL
statements performed by the relational database management system and optional
programmatic iteration, and the analytic algorithm creates at least one analytic
model within an analytic logical data model from data residing in the relational
database.

The foregoing description of the preferred embodiment of the invention has
been presented for the purposes of illustration and description. It is not intended to
be exhaustive or to limit the invention to the precise form disclosed. Many
modifications and variations are possible in light of the above teaching. It is
intended that the scope of the invention be limited not by this detailed description,
but rather by the claims appended hereto.

WHAT IS CLAIMED IS:

1. A computer-implemented system for performing data mining
applications, comprising:



(a) a computer having one or more data storage devices connected thereto; 

(b) a relational database management system, executed by the computer, for
managing a relational database stored on the data storage devices; and

(c) at least one analytic algorithm performed by the computer, wherein the
analytic algorithm includes SQL statements performed by the relational database
management system directly against the relational database and optional
programmatic iteration, and the analytic algorithm creates at least one analytic
model within an analytic logical data model from data residing in the relational
database.

2. The computer-implemented system of claim 1, wherein the analytic
algorithm provides statistical and machine learning [...]
analytic logical data model.

3. The computer-implemented system of claim 1, wherein the analytic
algorithm is implemented in Extended ANSI SQL.

4. The computer-implemented system of claim 3, wherein the analytic
algorithm operates against a set of tables in the relational database and the
Extended ANSI SQL builds relationships among data elements in the tables.

5. The computer-implemented system of claim 4, wherein the Extended
ANSI SQL analyzes the relationships to determine how the relationships [...]

6. The computer-implemented system of claim 1, wherein the analytic
algorithm is implemented in a Call Level Interface (CLI) that processes data from
the relational database using SQL and programmatic iteration.

7. The computer-implemented system of claim 6, wherein the CLI is 
used with SQL to perform computations, aggregations, and/or ordering on the data 
from the relational database.

8. The computer-implemented system of claim 1, wherein the analytic
algorithm is implemented by a Data Reduction Utility Program that reduces data
from the relational database in bulk using SQL followed by a non-SQL iterative
program.



9. The computer-implemented system of claim 8, wherein the Data
Reduction Utility Program [...] by programmatic iteration.

10. A method for performing data mining applications, comprising:

(a) managing a relational database stored on one or more data storage devices
connected to a computer; and

(b) performing at least one analytic algorithm in the computer, wherein the
analytic algorithm includes SQL statements performed by a relational database
management system directly against the relational database and optional
programmatic iteration, and the analytic algorithm creates at least one analytic
model within an analytic logical data model from data residing in the relational
database.



11. An article of manufacture comprising logic embodying a method for
performing data mining applications, comprising:

(a) managing a relational database stored on one or more data storage devices
connected to a computer; and

(b) performing at least one analytic algorithm in the computer, wherein the
analytic algorithm includes SQL statements performed by a relational database
management system directly against the relational database and optional
programmatic iteration, and the analytic algorithm creates at least one analytic
model within an analytic logical data model from data residing in the relational